(CELF Technical Jamboree#9)

Challenge to Improve Performance of Linux Mobile Phone

July 13, 2006
M. Fukunaga, technical manager, Mobile Terminal Development Division, NEC
(translated by ikoma)
---------------------------- = ----------------------------
table of contents

 1. History of performance improvement of NEC mobile phones
 2. Issues on performance
   2.1 System start up performance
   2.2 Screen switching performance
 3. Measures to improve performance
   3.1 Approaches to better performance
   3.2 Faster JFFS2 mounting
   3.3 Faster process invocation
 4. Summary

---------------------------- = ----------------------------
1. History of performance improvement of NEC mobile phones

  * performance: almost ten times faster system start up
  * function: voice communication only  -> TV phone/browser/Java

 (see chart in original pdf)
    initial trial (X + 3 apps) (02/autumn)
    baseline trial (telephony feature only) (03/spring)
    final trial (telephony, browser, WLAN)  (03/autumn)
    FOMA N900iL(first linux model)
    FOMA N901iC(second linux model)
    FOMA N902i(third linux model)
    FOMA N900i(RTOS model)
---------------------------- = ----------------------------
2. Issues on performance
  * Start up time
    - 90 sec at first trial (spring, 03)
    - required to reduce to 10 sec for product
  * Screen switching time between applications
    - 5-10 sec at first trial (spring, 03)
    - required to reduce to less than 1 sec for product
---------------------------- = ----------------------------
 2.1 System Start Up Time
                             03spring   target
   power on
   |  OS start up             11sec  ->  2sec
   |  middleware/app start    56sec  ->  8sec
   V  (total                  67sec  -> 10sec)
   standby screen


   details:
      OS start up(11 sec):
          init(20%)
          /bin/mount(17%)
          /bin/insmod(12%)
          /etc/init.d/rc.S(10%)
          syslogd(10%)
          /bin/sh
          .....

      Middleware/app start up(56 sec):
          x server(19%)
          kswapd(10%)
          browser(9%)
          stand-by(9%)
          .....
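
   The percentages above can be turned into approximate seconds per
   component. A minimal sketch in Python: the totals and shares are taken
   from this slide, while the resulting seconds are derived arithmetic,
   not measured values, and the helper name to_seconds is only illustrative.

```python
# Totals and percentage shares come from the slide; seconds are derived.
os_total_sec = 11
os_share = {"init": 20, "/bin/mount": 17, "/bin/insmod": 12,
            "/etc/init.d/rc.S": 10, "syslogd": 10}
mw_total_sec = 56
mw_share = {"x server": 19, "kswapd": 10, "browser": 9, "stand-by": 9}

def to_seconds(total_sec, share_percent):
    """Convert percentage shares of a phase into seconds."""
    return {name: round(total_sec * pct / 100, 2)
            for name, pct in share_percent.items()}

os_sec = to_seconds(os_total_sec, os_share)
mw_sec = to_seconds(mw_total_sec, mw_share)

for name, sec in {**os_sec, **mw_sec}.items():
    print(f"{name:18s} {sec:6.2f} sec")
```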
---------------------------- = ----------------------------
 2.2 Screen switching performance evaluation

    standby -> menu   1.7sec
    standby -> phone  2.5sec
    menu -> phone book 3.9sec
    menu -> setting    4.2sec
    phone book -> name input  1.1sec

     all targets are less than 1 sec

  analysis of application start up performance

    app(menu->phone book)
         -----------------------
         library initialization
         font load/initialization
         application processing


    resident app(stand by->phone book)
         library initialization
         font load/initialization
         -----------------------
         application processing

---------------------------- = ----------------------------
3. Measures to improve performance
---------------------------- = ----------------------------
 3.1 Approaches to better performance

    * start up time(OS start up)
        kernel tuning, suppressing logging, faster mount of file system
    * start up time(middleware/resident app)
        faster process invocation, introducing non-resident process,
        eliminating swap, faster rendering
    * screen transition time(standby->menu)
        reducing window components, faster rendering

---------------------------- = ----------------------------
 3.2 Faster JFFS2 mounting
---------------------------- = ----------------------------
  (1) JFFS2 File System
    Features of JFFS2 file system
      (http://www.linux-mtd.infradead.org)
    * file system for flash ROM
    * log structured file system
      - stores and organizes multiple nodes as a log
      - each node manages file information (data, file names, attributes etc.)
    * file data is stored compressed
      - about 40% smaller data size than ext2
    * tables (i-node table etc.) are created at mount time
      - the file management area is built in RAM at mount time
      - prevents file system corruption even if power is cut while data is being written
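
    Why a log-structured layout makes mounting expensive can be seen in a
    toy model. The sketch below is a simplification in Python, not JFFS2
    code: the Node fields and the mount function are assumptions made for
    illustration. The point is that the in-RAM table only exists after
    every node in the log has been visited.

```python
from dataclasses import dataclass

@dataclass
class Node:
    inode: int     # file the node belongs to
    version: int   # a higher version supersedes a lower one
    name: str
    data: bytes

def mount(log):
    """Scan the whole log and build the in-RAM table: inode -> newest node."""
    table = {}
    for node in log:
        current = table.get(node.inode)
        if current is None or node.version > current.version:
            table[node.inode] = node
    return table

# Updates are appended to the log; old nodes are not overwritten in place.
log = [
    Node(1, 1, "a.txt", b"old"),
    Node(2, 1, "b.txt", b"hello"),
    Node(1, 2, "a.txt", b"new"),
]
table = mount(log)
print(table[1].data)
```

    Because an update is a new node rather than an in-place overwrite, an
    interrupted write leaves the older node intact, which is the property
    the last bullet above refers to.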

---------------------------- = ----------------------------
  (2) File Access through JFFS2

       (see figure)

       Mounting the file system takes longer!
---------------------------- = ----------------------------
  (3) Mount time of JFFS2

       (see graph)
            JFFS2 processing
            MTD processing
            ECC calculation
            Chip access

   * The more files, the longer the mount time

---------------------------- = ----------------------------
  (4) Faster JFFS2 mounting

   * Optimization by detecting unused blocks
   * Reading only file management information
   * Simplifying error correction
   * Improving the error correction algorithm

---------------------------- = ----------------------------
  Optimization by detecting unused blocks

   * Provide a flag (free block marker) for each block that indicates whether the block contains any node; skip reading blocks that contain no node

          (see figure)

             Nodes(variable length)
             Block(16KB)
             Spare Area(16B x 32)
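
   The effect of the per-block flag can be sketched as follows. This is an
   illustrative Python model under stated assumptions: each block is
   represented as a (is_free, nodes) pair, and scan_blocks is a
   hypothetical name, not a function of JFFS2 or MTD.

```python
def scan_blocks(blocks):
    """blocks: list of (is_free, nodes) pairs.

    Returns (nodes_found, blocks_read). Blocks flagged as free are skipped
    without reading their contents.
    """
    found, blocks_read = [], 0
    for is_free, nodes in blocks:
        if is_free:
            continue  # flag says the block holds no node: do not read it
        blocks_read += 1
        found.extend(nodes)
    return found, blocks_read

blocks = [
    (False, ["n1", "n2"]),
    (True, []),
    (True, []),
    (False, ["n3"]),
]
found, blocks_read = scan_blocks(blocks)
print(found, blocks_read)
```

   In this model only 2 of the 4 blocks are read, so the saving grows with
   the share of unused blocks on the device.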
---------------------------- = ----------------------------
  Reading node headers only

   * Read only the node headers, which are all that mounting needs
      - Original JFFS2 read the entire flash ROM (FROM) and built the file management information from it
      - For the mount operation, reading the node header at the top of each node is enough
             (see figure)
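
   A minimal sketch of a header-only scan, in Python. The 8-byte header
   layout (inode number, payload length) is hypothetical and much simpler
   than a real JFFS2 node header; the names make_node and scan_headers are
   illustrative only.

```python
import struct

# Hypothetical header: inode number and payload length, little-endian.
HEADER = struct.Struct("<II")

def make_node(inode, payload):
    """Build one node: header followed by payload."""
    return HEADER.pack(inode, len(payload)) + payload

def scan_headers(image):
    """Walk the image reading headers only.

    Returns (entries, bytes_read), where each entry is
    (inode, payload_offset, payload_length).
    """
    entries, offset, bytes_read = [], 0, 0
    while offset + HEADER.size <= len(image):
        inode, length = HEADER.unpack_from(image, offset)
        bytes_read += HEADER.size
        entries.append((inode, offset + HEADER.size, length))
        offset += HEADER.size + length  # jump over the payload unread
    return entries, bytes_read

image = make_node(1, b"x" * 100) + make_node(2, b"y" * 50)
entries, bytes_read = scan_headers(image)
print(entries, bytes_read, len(image))
```

   Here the scan touches 16 of 166 bytes; the payload is located but not
   read until a file is actually accessed.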
---------------------------- = ----------------------------
  Simplifying error correction

   * Omit reading and calculating ECC in MTD

     - Data had been checked twice: in flash ROM (FROM), an ECC is stored in the spare area for error correction of the data, and JFFS2 also stores a CRC in the node header

     - JFFS2 now reads data without ECC; if a CRC error occurs in JFFS2, it rereads the data with ECC and corrects it
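
   The "CRC first, ECC only on failure" flow can be sketched like this.
   It is a Python illustration under loud assumptions: zlib.crc32 stands
   in for the JFFS2 node CRC, and read_with_ecc is a stand-in that is
   simply handed the corrected data, whereas a real MTD driver would
   reconstruct it from the spare-area ECC.

```python
import zlib

def read_with_ecc(raw, clean):
    """Stand-in for the slow path. A real driver would correct 'raw' using
    the ECC in the spare area; here the corrected data is supplied."""
    return clean

def read_node(raw, stored_crc, clean):
    """Return (data, used_ecc). Try the fast path without ECC first."""
    if zlib.crc32(raw) == stored_crc:
        return raw, False            # common case: no ECC work at all
    data = read_with_ecc(raw, clean)  # rare case: reread with ECC
    if zlib.crc32(data) != stored_crc:
        raise IOError("uncorrectable node")
    return data, True

good = b"node payload"
crc = zlib.crc32(good)
flipped = bytes([good[0] ^ 0x01]) + good[1:]  # simulate a single-bit error

print(read_node(good, crc, good))
print(read_node(flipped, crc, good))
```

   The design trades a slower worst case (two reads when an error occurs)
   for a faster common case, which pays off as long as bit errors are rare.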
---------------------------- = ----------------------------
  Improving error correction algorithm

    * replaced the ECC calculation algorithm with a faster one
      - replaced the ECC processing of the MTD driver with the faster one from the YAFFS file system

    * optimized computation
      - optimized the ECC computation itself by unrolling loops
      - tried several variants and chose the fastest unrolling method
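
    Loop unrolling itself can be illustrated with a simple XOR parity over
    a block. This Python sketch is an assumption-laden stand-in: plain byte
    parity is far simpler than the ECC actually used, and the speed benefit
    of unrolling applies to compiled driver code rather than to Python. It
    only shows that the unrolled form computes the same result with fewer
    loop iterations.

```python
def parity_loop(block):
    """One XOR per loop iteration."""
    acc = 0
    for b in block:
        acc ^= b
    return acc

def parity_unrolled4(block):
    """Four XORs per loop iteration, plus a tail loop for leftovers."""
    acc = 0
    n = len(block) - len(block) % 4
    for i in range(0, n, 4):
        acc ^= block[i] ^ block[i + 1] ^ block[i + 2] ^ block[i + 3]
    for b in block[n:]:
        acc ^= b
    return acc

block = bytes(range(256))
assert parity_loop(block) == parity_unrolled4(block)
print(parity_unrolled4(bytes([1, 2, 4, 8, 16])))
```

    Trying several unroll factors and timing each, as the slide describes,
    is the usual way to pick one, since the best factor depends on the CPU.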
---------------------------- = ----------------------------
  (5) Improved result of JFFS2 mount operation
     (see graph)

     When the number of files is 5000,
        8.65 sec -> 2.39 sec

                JFFS2 processing
                MTD processing
                ECC calculation
                Chip access
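
     Expressed as a ratio, the two mount times quoted above work out as
     follows. The input figures are from this slide; the speedup and the
     percentage are derived arithmetic.

```python
before_sec, after_sec = 8.65, 2.39   # mount time with 5000 files

speedup = before_sec / after_sec
reduction_pct = (before_sec - after_sec) / before_sec * 100

print(f"{speedup:.2f}x faster, {reduction_pct:.0f}% less mount time")
# prints: 3.62x faster, 72% less mount time
```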
---------------------------- = ----------------------------
 3.3 Faster process invocation
---------------------------- = ----------------------------
  (1) Applying prelinking
    * What is prelink?  (http://tree.celinuxforum.org/CelfPubWiki/PreLinking)
      - Resolves DLL linking information when the executable file is built
      - The loader can skip resolving symbols at process start
    * Effect
      - fork: 2,110ms -> 169ms (about 92% reduced)
      - the more DLLs an application links, the larger the effect
    * Issues on applying
      - the "-fPIC" option is required when building objects
      - When a DLL is modified, all executable files that link to it must be re-prelinked
      - 20-30% larger object size
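
    The idea behind prelinking can be shown with a toy model. The sketch
    below is Python and purely illustrative: libraries are modeled as
    dictionaries of symbol -> address, and the function names are
    hypothetical. It is not how the prelink tool or the dynamic loader is
    implemented; it only contrasts "search at every start" with "search
    once at build time".

```python
def load_time_resolve(needed, libraries):
    """Dynamic linking: search the libraries for each symbol at start-up.
    Returns (addresses, lookups)."""
    addresses, lookups = {}, 0
    for sym in needed:
        for lib in libraries:
            lookups += 1
            if sym in lib:
                addresses[sym] = lib[sym]
                break
    return addresses, lookups

def prelink(needed, libraries):
    """Build time: resolve once and keep the result with the executable."""
    addresses, _ = load_time_resolve(needed, libraries)
    return addresses

def start_prelinked(stored):
    """Start-up of a prelinked executable: no symbol search needed."""
    return stored, 0

libs = [{"printf": 0x1000}, {"draw": 0x2000}, {"dial": 0x3000}]
needed = ["dial", "draw", "printf"]

print(load_time_resolve(needed, libs)[1])           # lookups at every start
print(start_prelinked(prelink(needed, libs))[1])    # lookups when prelinked
```

    The stored addresses are only valid while the libraries stay unchanged,
    which is why a modified DLL forces re-prelinking.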
---------------------------- = ----------------------------
  (2) Improving performance of glibc
    * Simplifying setlocale
      - Usually: character types, time format definitions etc. are read from files and stored in the locale table
      - -> Store only the necessary information in the locale table
    * Effect
      - processing time: 134ms -> 2ms (98% reduced)
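
    A toy model of the simplification, in Python. The category names are
    the standard locale categories, but the table contents and the choice
    of which category is "necessary" are assumptions for illustration; the
    slide does not say which information was kept, and this is not glibc
    code.

```python
# Hypothetical stand-in for the locale definition files.
FULL_LOCALE = {
    "LC_CTYPE": "character types",
    "LC_TIME": "time formats",
    "LC_MONETARY": "currency formats",
    "LC_NUMERIC": "number formats",
    "LC_COLLATE": "sort order",
    "LC_MESSAGES": "message catalogs",
}

def setlocale_full():
    """Usual behavior in this model: load every category into the table."""
    return dict(FULL_LOCALE)

def setlocale_minimal(needed):
    """Simplified behavior: load only the categories the system needs."""
    return {cat: FULL_LOCALE[cat] for cat in needed}

full = setlocale_full()
minimal = setlocale_minimal(["LC_CTYPE"])  # assumed choice, for illustration
print(len(full), len(minimal))
```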
---------------------------- = ----------------------------
4. Summary
---------------------------- = ----------------------------
  Measures and Effects
  * Improvement of OS start up
    - Focused on improving JFFS2 mount performance
    - 8.65 sec -> 2.39 sec (when 5000 files)
  * Improvement of process invocation
    - Applied prelink: 2,110msec -> 169msec (fork)
    - Simplified setlocale: 134msec -> 2msec
---------------------------- = ----------------------------
Empowered by Innovation
NEC
---------------------------- = ----------------------------
